treatment arm
Equivalence Testing Under Privacy Constraints
Pareek, Savita, Insolia, Luca, Molinari, Roberto, Guerrier, Stéphane
Protecting individual privacy is essential across research domains, from socio-economic surveys to big-tech user data. This need is particularly acute in healthcare, where analyses often involve sensitive patient information. A typical example is comparing treatment efficacy across hospitals or ensuring consistency in diagnostic laboratory calibrations, both requiring privacy-preserving statistical procedures. However, standard equivalence testing procedures for differences in proportions or means, commonly used to assess average equivalence, can inadvertently disclose sensitive information. To address this problem, we develop differentially private equivalence testing procedures that rely on simulation-based calibration, as the finite-sample distribution is analytically intractable. Our approach introduces a unified framework, termed DP-TOST, for conducting differentially private equivalence testing of both means and proportions. Through numerical simulations and real-world applications, we demonstrate that the proposed method maintains type-I error control at the nominal level and achieves power comparable to its non-private counterpart as the privacy budget and/or sample size increases, while ensuring strong privacy guarantees. These findings establish a reliable and practical framework for privacy-preserving equivalence testing in high-stakes fields such as healthcare, among others.
Continuous-Time Learning of Probability Distributions: A Case Study in a Digital Trial of Young Children with Type 1 Diabetes
Álvarez-López, Antonio, Matabuena, Marcos
Understanding how biomarker distributions evolve over time is a central challenge in digital health and chronic disease monitoring. In diabetes, changes in the distribution of glucose measurements can reveal patterns of disease progression and treatment response that conventional summary measures miss. Motivated by a 26-week clinical trial comparing the closed-loop insulin delivery system t:slim X2 with standard therapy in children with type 1 diabetes, we propose a probabilistic framework to model the continuous-time evolution of time-indexed distributions using continuous glucose monitoring data (CGM) collected every five minutes. We represent the glucose distribution as a Gaussian mixture, with time-varying mixture weights governed by a neural ODE. We estimate the model parameter using a distribution-matching criterion based on the maximum mean discrepancy. The resulting framework is interpretable, computationally efficient, and sensitive to subtle temporal distributional changes. Applied to CGM trial data, the method detects treatment-related improvements in glucose dynamics that are difficult to capture with traditional analytical approaches.
Knowledge-based Graphical Method for Safety Signal Detection in Clinical Trials
Vandenhende, Francois, Georgiou, Anna, Georgiou, Michalis, Psaras, Theodoros, Karekla, Ellie, Hadjicosta, Elena
We present a graphical, knowledge-based method for reviewing treatment-emergent adverse events (AEs) in clinical trials. The approach enhances MedDRA by adding a hidden medical knowledge layer (Safeterm) that captures semantic relationships between terms in a 2-D map. Using this layer, AE Preferred Terms can be regrouped automatically into similarity clusters, and their association to the trial disease may be quantified. The Safeterm map is available online and connected to aggregated AE incidence tables from ClinicalTrials.gov. For signal detection, we compute treatment-specific disproportionality metrics using shrinkage incidence ratios. Cluster-level EBGM values are then derived through precision-weighted aggregation. Two visual outputs support interpretation: a semantic map showing AE incidence and an expectedness-versus-disproportionality plot for rapid signal detection. Applied to three legacy trials, the automated method clearly recovers all expected safety signals. Overall, augmenting MedDRA with a medical knowledge layer improves clarity, efficiency, and accuracy in AE interpretation for clinical trials.
Minimax and Bayes Optimal Best-arm Identification: Adaptive Experimental Design for Treatment Choice
This study investigates adaptive experimental design for treatment choice, also known as fixed-budget best-arm identification. We consider an adaptive procedure consisting of a treatment-allocation phase followed by a treatment-choice phase, and we design an adaptive experiment for this setup to efficiently identify the best treatment arm, defined as the one with the highest expected outcome. In our designed experiment, the treatment-allocation phase consists of two stages. The first stage is a pilot phase, where we allocate each treatment arm uniformly with equal proportions to eliminate clearly suboptimal arms and estimate outcome variances. In the second stage, we allocate treatment arms in proportion to the variances estimated in the first stage. After the treatment-allocation phase, the procedure enters the treatment-choice phase, where we choose the treatment arm with the highest sample mean as our estimate of the best treatment arm. We prove that this single design is simultaneously asymptotically minimax and Bayes optimal for the simple regret, with upper bounds that match our lower bounds up to exact constants. Therefore, our designed experiment achieves the sharp efficiency limits without requiring separate tuning for minimax and Bayesian objectives.
Admissibility of Completely Randomized Trials: A Large-Deviation Approach
Imbens, Guido, Qin, Chao, Wager, Stefan
When an experimenter has the option of running an adaptive trial, is it admissible to ignore this option and run a non-adaptive trial instead? We provide a negative answer to this question in the best-arm identification problem, where the experimenter aims to allocate measurement efforts judiciously to confidently deploy the most effective treatment arm. We find that, whenever there are at least three treatment arms, there exist simple adaptive designs that universally and strictly dominate non-adaptive completely randomized trials. This dominance is characterized by a notion called efficiency exponent, which quantifies a design's statistical efficiency when the experimental sample is large. Our analysis focuses on the class of batched arm elimination designs, which progressively eliminate underperforming arms at pre-specified batch intervals. We characterize simple sufficient conditions under which these designs universally and strictly dominate completely randomized trials. These results resolve the second open problem posed in Qin [2022].
Towards Regulatory-Confirmed Adaptive Clinical Trials: Machine Learning Opportunities and Solutions
Klein, Omer Noy, Hüyük, Alihan, Shamir, Ron, Shalit, Uri, van der Schaar, Mihaela
Randomized Controlled Trials (RCTs) are the gold standard for evaluating the effect of new medical treatments. Treatments must pass stringent regulatory conditions in order to be approved for widespread use, yet even after the regulatory barriers are crossed, real-world challenges might arise: Who should get the treatment? What is its true clinical utility? Are there discrepancies in the treatment effectiveness across diverse and under-served populations? We introduce two new objectives for future clinical trials that integrate regulatory constraints and treatment policy value for both the entire population and under-served populations, thus answering some of the questions above in advance. Designed to meet these objectives, we formulate Randomize First Augment Next (RFAN), a new framework for designing Phase III clinical trials. Our framework consists of a standard randomized component followed by an adaptive one, jointly meant to efficiently and safely acquire and assign patients into treatment arms during the trial. Then, we propose strategies for implementing RFAN based on causal, deep Bayesian active learning. Finally, we empirically evaluate the performance of our framework using synthetic and real-world semi-synthetic datasets.
Minimax Optimal Simple Regret in Two-Armed Best-Arm Identification
This study investigates an asymptotically minimax optimal algorithm in the two-armed fixed-budget best-arm identification (BAI) problem. Given two treatment arms, the objective is to identify the arm with the highest expected outcome through an adaptive experiment. We focus on the Neyman allocation, where treatment arms are allocated following the ratio of their outcome standard deviations. Our primary contribution is to prove the minimax optimality of the Neyman allocation for the simple regret, defined as the difference between the expected outcomes of the true best arm and the estimated best arm. Specifically, we first derive a minimax lower bound for the expected simple regret, which characterizes the worst-case performance achievable under the location-shift distributions, including Gaussian distributions. We then show that the simple regret of the Neyman allocation asymptotically matches this lower bound, including the constant term, not just the rate in terms of the sample size, under the worst-case distribution. Notably, our optimality result holds without imposing locality restrictions on the distribution, such as the local asymptotic normality. Furthermore, we demonstrate that the Neyman allocation reduces to the uniform allocation, i.e., the standard randomized controlled trial, under Bernoulli distributions.
Using Large Language Models to Generate Clinical Trial Tables and Figures
Yang, Yumeng, Krusche, Peter, Pantoja, Kristyn, Shi, Cheng, Ludmir, Ethan, Roberts, Kirk, Zhu, Gen
Tables, figures, and listings (TFLs) are essential tools for summarizing clinical trial data. Creation of TFLs for reporting activities is often a time-consuming task encountered routinely during the execution of clinical trials. This study explored the use of large language models (LLMs) to automate the generation of TFLs through prompt engineering and few-shot transfer learning. Using public clinical trial data in ADaM format, our results demonstrated that LLMs can efficiently generate TFLs with prompt instructions, showcasing their potential in this domain. Furthermore, we developed a conservational agent named "Clinical Trial TFL Generation Agent": An app that matches user queries to predefined prompts that produce customized programs to generate specific pre-defined TFLs. 1 Introduction In the pharmaceutical industry, submission of a clinical study report (CSR) is part of the drug approval process with health authorities.
Mathematical Programming For Adaptive Experiments
Che, Ethan, Jiang, Daniel R., Namkoong, Hongseok, Wang, Jimmy
Adaptive experimentation can significantly improve statistical power, but standard algorithms overlook important practical issues including batched and delayed feedback, personalization, non-stationarity, multiple objectives, and constraints. To address these issues, the current algorithm design paradigm crafts tailored methods for each problem instance. Since it is infeasible to devise novel algorithms for every real-world instance, practitioners often have to resort to suboptimal approximations that do not address all of their challenges. Moving away from developing bespoke algorithms for each setting, we present a mathematical programming view of adaptive experimentation that can flexibly incorporate a wide range of objectives, constraints, and statistical procedures. By formulating a dynamic program in the batched limit, our modeling framework enables the use of scalable optimization methods (e.g., SGD and auto-differentiation) to solve for treatment allocations. We evaluate our framework on benchmarks modeled after practical challenges such as non-stationarity, personalization, multi-objectives, and constraints. Unlike bespoke algorithms such as modified variants of Thomson sampling, our mathematical programming approach provides remarkably robust performance across instances.
AExGym: Benchmarks and Environments for Adaptive Experimentation
Wang, Jimmy, Che, Ethan, Jiang, Daniel R., Namkoong, Hongseok
Innovations across science and industry are evaluated using randomized trials (i.e., A/B tests). While simple and robust, such static designs are inefficient or infeasible for testing many hypotheses. Adaptive designs can greatly improve statistical power in theory, but they have seen limited adoption due to their fragility in practice. We present a benchmark for adaptive experimentation based on realworld datasets, highlighting prominent practical challenges to operationalizing adaptivity: non-stationarity, batched/delayed feedback, multiple outcomes and objectives, and external validity. Our benchmark aims to spur methodological development that puts practical performance (e.g., robustness) as a central concern, rather than mathematical guarantees on contrived instances. We release an opensource library, AExGym, which is designed with modularity and extensibility in mind to allow experimentation practitioners to develop and benchmark custom environments and algorithms.